Text Summarization using Random Indexing and PageRank

نویسندگان

  • Pär Gustavsson
  • Arne Jönsson
چکیده

We present results from evaluations of an automatic text summarization technique that uses a combination of Random Indexing and PageRank. In our experiments we use two types of texts: news paper texts and government texts. Our results show that text type as well as other aspects of texts of the same type influence the performance. Combining PageRank and Random Indexing provides the best results on government texts. Adapting a text summarizer for a particular genre can improve text summarization.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Random Indexing and Modified Random Indexing based approach for extractive text summarization

Random Indexing based extractive text summarization has already been proposed in literature. This paper looks at the above technique in detail, and proposes several improvements. The improvements are both in terms of formation of index (word) vectors of the document, and construction of context vectors by using convolution instead of addition operation on the index vectors. Experiments have bee...

متن کامل

Enhancing extraction based summarization with outside word space

We present results from improving vector space based extraction summarizers. The summarizer uses Random Indexing and Page Rank to extract those sentences whose importance are ranked highest for a document, based on vector similarity. Originally the summarizer used only word vectors based on the words in the document to be summarized. By using a larger word space model the performance of the sum...

متن کامل

Text Summarization Using Cuckoo Search Optimization Algorithm

Today, with rapid growth of the World Wide Web and creation of Internet sites and online text resources, text summarization issue is highly attended by various researchers. Extractive-based text summarization is an important summarization method which is included of selecting the top representative sentences from the input document. When, we are facing into large data volume documents, the extr...

متن کامل

CLAIRLIB Documentation v1.03

The Clair library is intended to simplify a number of generic tasks in Natural Language Processing (NLP), Information Retrieval (IR), and Network Analysis. Its architecture also allows for external software to be plugged in with very little effort. Functionality native to Clairlib includes Tokenization, Summarization, LexRank, Biased LexRank, Document Clustering, Document Indexing, PageRank, Bi...

متن کامل

Automatic summarization as means of simplifying texts, an evaluation for Swedish

We have developed an extraction based summarizer based on a word space model and PageRank and compared the readability of the resulting summaries with the original text, using various measures for Swedish and texts from different genres. The measures include among others readability index (LIX), nominal ratio (NR) and word variation index (OVIX). The measures correspond to the vocabulary load, ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010